The Bayesian Sorting Hat

Authors

  • Justin D. Silverman
  • Rachel Silverman

Abstract

Size-constrained clustering (SCC) refers to the dual problem of using observations to determine latent cluster structure while simultaneously assigning observations to the unknown clusters, subject to an analyst-defined constraint on cluster sizes. While several approaches have been proposed, SCC remains difficult because the size constraints introduce combinatorial dependencies between observations. Here we reformulate SCC as a decision problem and introduce a novel loss function that captures various types of size constraints. In contrast to prior work, our approach is uniquely suited to situations in which size constraints reflect an external limitation or desire rather than an internal feature of the data-generating process. To demonstrate our approach, we develop a Bayesian mixture model for clustering respondents and apply it to both simulated and real categorical survey data. Our motivation for developing this decision-theoretic approach to SCC was to determine optimal team assignments for a Harry Potter-themed scavenger hunt based on categorical survey data from participants.
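The abstract does not spell out the mixture model or the loss function, so the following is only a minimal sketch of the general decision-theoretic idea under assumptions not taken from the paper. It uses a latent class (naive Bayes) mixture for categorical answers with fixed, known parameters in place of posterior samples. It substitutes expected 0-1 loss (the expected number of misassigned respondents) for the authors' loss. It also assumes an exact target size for each cluster. The functions `membership_probs` and `best_assignment` and the toy parameters are hypothetical. The search is exhaustive and suits only toy problems, but it shows how a size constraint couples the labels of different respondents.

```python
from itertools import product
from math import prod


def membership_probs(responses, weights, theta):
    """P(cluster k | answers of respondent i) under a latent class model.

    Assumes answers are conditionally independent given the cluster, and
    that weights[k] and theta[k][j][a] = P(answer a to question j | cluster k)
    are fixed, known values (a full Bayesian fit would average over draws).
    """
    probs = []
    for answers in responses:
        scores = [w * prod(theta[k][j][a] for j, a in enumerate(answers))
                  for k, w in enumerate(weights)]
        total = sum(scores)
        probs.append([s / total for s in scores])
    return probs


def best_assignment(probs, sizes):
    """Labelling with minimum expected 0-1 loss among those whose cluster
    sizes match `sizes` exactly. Exhaustive search: toy problems only."""
    n, n_clusters = len(probs), len(sizes)
    if sum(sizes) != n:
        raise ValueError("target sizes must sum to the number of respondents")
    best, best_loss = None, float("inf")
    for labels in product(range(n_clusters), repeat=n):
        if any(labels.count(k) != sizes[k] for k in range(n_clusters)):
            continue  # violates the size constraint
        # Expected number of respondents placed in the wrong cluster.
        loss = sum(1 - probs[i][k] for i, k in enumerate(labels))
        if loss < best_loss:
            best, best_loss = labels, loss
    return best, best_loss


# Toy example: two clusters, two yes/no questions, four respondents.
weights = [0.5, 0.5]
theta = [
    [[0.9, 0.1], [0.8, 0.2]],  # cluster 0 tends to answer 0
    [[0.2, 0.8], [0.3, 0.7]],  # cluster 1 tends to answer 1
]
responses = [(0, 0), (0, 0), (0, 1), (1, 1)]
probs = membership_probs(responses, weights, theta)
print(best_assignment(probs, [3, 1])[0])  # (0, 0, 0, 1)
print(best_assignment(probs, [2, 2])[0])  # (0, 0, 1, 1)
```

With sizes `[3, 1]` the result matches the unconstrained per-respondent choice. Requiring equal sizes `[2, 2]` moves respondent 2, whose answers are the most ambiguous, rather than respondent 0 or 1, so the labels are chosen jointly rather than one respondent at a time. For realistic numbers of participants, the exhaustive loop would need to be replaced by an assignment or integer-programming solver.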

Similar resources

The Sorting Hat Goes to College

In the Harry Potter stories [R], each new year at the Hogwarts School of Witchcraft and Wizardry starts with the ceremonial assignment of the new first-year students to one of the four houses: Gryffindor, Hufflepuff, Ravenclaw, and Slytherin. This is a milestone for the young students because their assigned houses greatly influence their future direction. The crucial assignment process is entr...

Fast Construction of a Word-Number Index for Large Data

The paper presents a work still in progress, but with promising results. We offer a new method of construction of word to number and number to word indices for very large corpus data (tens of billions of tokens), which is up to an order of magnitude faster than the current approach. We use HAT-trie for sorting the data and Daciuk’s algorithm for building a minimal deterministic finite state aut...

A Bayesian Analysis of HAT-P-7b Using the EXONEST Algorithm

The study of exoplanets (planets orbiting other stars) is revolutionizing the way we view our universe. High-precision photometric data provided by the Kepler Space Telescope (Kepler) enables not only the detection of such planets, but also their characterization. This presents a unique opportunity to apply Bayesian methods to better characterize the multitude of previously confirmed exoplanets...

Numerical and Experimental Investigation of the Effect of Different Orientation Angles on Crash Behavior of Composite Hat Shape Energy Absorber

Lightweighting of the car body and crashworthiness are two important objectives of car design. Due to their excellent performance, composite materials are extensively used in the car industry. In addition, reducing the weight of the vehicle is effective in decreasing fuel consumption. Hat-shaped energy absorbers are used in car doors for side-impact protection. The aim of these numerical models and expe...

A Non-parametric Bayesian Framework for Spike Sorting Using Optimal Quantization

This paper describes an approach that performs spike sorting by a nonparametric density estimation technique under a Bayesian framework. The technique is based on an optimal quantization method. We performed experiments on simulated and real spike signals. The results are comparable with what is reported in the literature.

Publication date: 2017